24 research outputs found

    A fragmentising interface to a large corpus of digitized text: (Post)humanism and non-consumptive reading via features

    While the idea of distant reading does not rule out the possibility of close reading of the individual components of the corpus of digitized text that is being distant-read, this ceases to be the case when parts of the corpus are, for reasons relating to intellectual property, not accessible for consumption through downloading followed by close reading. Copyright restrictions on material in collections of digitized text such as the HathiTrust Digital Library (HTDL) necessitate providing facilities for non-consumptive reading, one of the approaches to which consists of providing users with features from the text in the form of small fragments of text, instead of the text itself. We argue that, contrary to expectation, the fragmentary quality of the features generated by the reading interface does not necessarily imply that the mode of reading enabled and mediated by these features points in an anti-humanist direction. We pose the fragmentariness of the features as paradigmatic of the fragmentation with which digital techniques tend, more generally, to trouble the humanities. We then generalize our argument to put our work on feature-based non-consumptive reading in dialogue with contemporary debates in philosophy and in cultural theory and criticism about posthumanism and agency. While the locus of agency in such a non-consumptive practice of reading does not coincide with the customary figure of the singular human subject as reader, it is possible to accommodate this fragmentising practice within the terms of an ampler notion of agency imagined as dispersed across an entire technosocial ensemble. When grasped in this way, such a practice of reading may be considered posthumanist but not necessarily antihumanist.

    When the Elevator Pitch Meets the Subject Heading: How Mixtures of Other Documents Can Describe What a Document is About.

    We explore the concept of mixture descriptions. Commonly used in film reviews, they describe a film in terms of a combination of two or more other films. This very concrete approach to description can be contrasted with the abstractions typically used in subject headings or the names of genres. By exploring a dataset of film reviews, we uncover some of the features of mixture descriptions as they are used colloquially and investigate when and how they may prove useful. This form of description through combination is not specific to film, and we look at its potential as a bottom-up, ludic form of document description.

    Design Facets of Crowdsourcing

    Crowdsourcing offers a way for information scientists to engage with the public and potentially collect valuable new data about documents. However, the space of crowdsourcing is very broad, with many design choices that differentiate existing projects significantly. This can make the space of crowdsourcing rather daunting. Building upon efforts from other fields to conceptualize it, we develop a typology of crowdsourcing for information science. Through dimensions covering motivation, centrality, beneficiary, aggregation, type of work, and type of crowd, our typology provides a way to understand crowdsourcing.

    Characterizing Same Work Relationships in Large-Scale Digital Libraries

    As digital libraries grow, they are prompting new consideration of same-work relationships. They provide unique opportunities for resource discovery, but their scale and aggregated models lead to challenges presented by duplicates and variants. Addressing this problem is complicated by metadata inconsistencies as well as structural/content differences. Following from work in algorithmically identifying duplicate works in the HathiTrust Digital Library, we present some cases that complicate our existing language for work entity relationships. These serve to contextualize the complexities of same-work alignment in digital libraries, ground future discussion around content similarity, and inform methods to better identify duplicates in large-scale digital libraries.

    Matching and Grokking: Approaches to Personalized Crowdsourcing

    Personalization in computing helps tailor content to a person’s individual tastes. As a result, the tasks that benefit from personalization are inherently subjective. Many of the most robust approaches to personalization rely on large sets of other people’s preferences. However, existing preference data is not always available. In these cases we propose leveraging online crowds to provide on-demand personalization. We introduce and evaluate two methods for personalized crowdsourcing: taste-matching, for finding crowd workers who are similar to a personalization target, and taste-grokking, where crowd workers explicitly predict the requester’s tastes. Both approaches show improvement over a non-personalized baseline, and each has benefits and drawbacks that are discussed.

    Personalized human computation

    Significant effort in machine learning and information retrieval has been devoted to identifying personalized content such as recommendations and search results. Personalized human computation has the potential to go beyond existing techniques like collaborative filtering to provide personalized results on demand, over personal data, and for complex tasks. This work-in-progress compares two approaches to personalized human computation. In both, users annotate a small set of training examples which are then used by the crowd to annotate unseen items. In the first approach, which we call taste-matching, crowd members are asked to annotate the same set of training examples, and the ratings of similar users on other items are then used to infer personalized ratings. In the second approach, taste-grokking, the crowd is presented with the training examples and asked to use them to predict the ratings of the target user on other items.
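
    The taste-matching idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of root-mean-square error as the similarity measure, the choice of k nearest workers, and the plain averaging of their ratings are all assumptions made for illustration.

```python
from math import sqrt

def rmse(a, b):
    """Root-mean-square difference between two equal-length rating lists."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def taste_match(target_train, workers, k=2):
    """Predict a target user's ratings on unseen items (hypothetical helper).

    target_train: {item: rating} for the training examples rated by the target
    workers: {worker_id: {item: rating}} covering training and unseen items
    k: number of most similar workers to use (an assumed parameter)
    """
    items = sorted(target_train)
    target_vec = [target_train[i] for i in items]
    # Rank workers by how closely their training ratings match the target's.
    ranked = sorted(
        workers,
        key=lambda w: rmse(target_vec, [workers[w][i] for i in items]),
    )
    nearest = ranked[:k]
    # Items rated by the nearest workers that the target has not rated.
    unseen = {i for w in nearest for i in workers[w]} - set(items)
    predictions = {}
    for item in unseen:
        ratings = [workers[w][item] for w in nearest if item in workers[w]]
        # Assumption: a plain average of the nearest workers' ratings.
        predictions[item] = sum(ratings) / len(ratings)
    return predictions

target = {"a": 5, "b": 1}
crowd = {
    "w1": {"a": 5, "b": 1, "c": 4},
    "w2": {"a": 4, "b": 2, "c": 2},
    "w3": {"a": 1, "b": 5, "c": 1},
}
print(taste_match(target, crowd, k=2))
```

    Here the two workers whose training ratings sit closest to the target's contribute to the prediction for the unseen item, while the dissimilar worker is ignored.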

    Access to billions of pages for large-scale text analysis

    Consortial collections have led to unprecedented scales of digitized corpora, but the insights that they enable are hampered by the complexities of access, particularly to in-copyright or orphan works. Pursuing a principle of non-consumptive access, we developed the Extracted Features (EF) dataset, a dataset of quantitative counts for every page of nearly 5 million scanned books. The EF includes unigram counts, part-of-speech tagging, header and footer extraction, counts of characters at both sides of the page, and more. Distributing book data with features already extracted saves resource costs associated with large-scale text use, improves the reproducibility of research done on the dataset, and opens the door to datasets on copyrighted books. We describe the coverage of the dataset and demonstrate its useful application through duplicate book alignment and identification of their cleanest scans, topic modeling, word list expansion, and multifaceted visualization.
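
    The notion of page-level unigram counts as a non-consumptive feature can be sketched as follows. This is only an illustration of the general idea, not the EF dataset's actual schema or extraction pipeline; the function name and the simple lowercase alphabetic tokenizer are assumptions.

```python
import re
from collections import Counter

def page_features(pages):
    """Return one unigram-count dict per page (hypothetical helper).

    Only the counts are kept, so word order, and with it the readable
    running text, is not retained in the output.
    """
    features = []
    for page in pages:
        # Assumed tokenizer: lowercase runs of ASCII letters.
        tokens = re.findall(r"[a-z]+", page.lower())
        features.append(dict(Counter(tokens)))
    return features

book = ["The cat saw the dog.", "Dog days"]
print(page_features(book))
```

    Counts of this kind still support analyses such as topic modeling, which operate on word frequencies rather than on the original sequence of words.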

    Measuring flexibility: A text-mining approach

    In creativity research, ideational flexibility, the ability to generate ideas by shifting between concepts, has long been the focus of investigation. However, psychometric work to develop measurement procedures for flexibility has generally lagged behind that for other creativity-relevant constructs such as fluency and originality. Here, we build from extant research to theoretically posit, and then empirically validate, a text-mining-based method for measuring flexibility in verbal divergent thinking (DT) responses. The empirical validation of this method is accomplished in two studies. In the first study, we use the verbal form of the Torrance Test of Creative Thinking (TTCT) to demonstrate that our novel flexibility scoring method strongly and positively correlates with traditionally used TTCT flexibility scores. In the second study, we conduct a confirmatory factor analysis using the Alternate Uses Task to show reliability and construct validity of our text-mining-based flexibility scoring. We also examine the relationship between personality facets and flexibility of ideas to provide criterion validity of our scoring methodology. Given the psychometric evidence presented here and the practicality of automated scores, we recommend adopting this new method, which provides a less labor-intensive and less costly objective measurement of flexibility.

    Design jams in iSchools: Approaches, challenges and examples

    Through a live demonstration, we will showcase a group of focused design techniques known collectively as a Design Jam. Design jams are about looking at a particular design challenge and thinking-by-doing. Although they often have a component of brainstorming, they involve additional activities, including paper prototyping, and storytelling with personas and scenarios. After the design jam, we will share experiences of teaching design techniques in iSchools.